CTS Text Miner: Text Mining Framework based on the Canonical Text Service Protocol
Abstract
This paper describes a modular text mining framework that uses the Canonical Text Service (CTS) protocol as its data source. By combining standardized functionality with standardized access to text data, the framework aims to reduce the heterogeneity of workflows in today’s Digital Humanities and to serve as an important element of a text research infrastructure. The work builds on the implementation of the CTS protocol described in (Tiepmar, 2015) and uses advanced functionality that is not part of the CTS specification. Consequently, while most current modules should work with other implementations of the CTS protocol, it cannot be guaranteed that future modules will.
Related resources
Canonical Text Services in CLARIN: Reaching out to the Digital Classics and beyond
Providing both user-friendly and machine-readable interfaces to digital resources is one of the key tasks of highly integrated research infrastructures like CLARIN. The presented implementation of the Canonical Text Service Protocol CTS covers many of the associated problems, like dealing with varying levels of text granularity, persistent identification and address resolution, and simple inter...
An Overview of Canonical Text Services
This paper provides a comprehensive overview of Canonical Text Services (CTS) and the surrounding tools that were developed on the basis of a MySQL-based implementation. As such, it covers a broad set of topics including a general explanation of CTS, various software tools and a wide array of text mining techniques. The goal is to compile the relatively widespread and potentially confusing amoun...
Web-Based Text Mining of Hotel Customer Comments Using SAS® Text Miner and Megaputer PolyAnalyst®
This paper presents text mining with SAS® Text Miner and Megaputer PolyAnalyst®, applied specifically to hotel customer survey data and its data management. The paper reviews the current literature on text mining and discusses the features of these two text mining software packages for analyzing unstructured qualitative data in the following key steps: data preparation, data analysis, and result repo...
A New Implementation for Canonical Text Services
This paper introduces a new implementation of the Canonical Text Services (CTS) protocol intended to be capable of handling thousands of editions. CTS was introduced for the Digital Humanities and is based on a hierarchical structuring of texts down to the level of individual words mirroring traditional practices of citing. The paper gives an overview of CTS for those that are unfamiliar and es...
Text Mining in Pharma and Intelligence
1. Profiling and classification of scientific documents with SAS Text Miner. SAS Institute (www.sas.com) and the European Molecular Biology Laboratory (EMBL) / the ELM Consortium (http://elm.eu.org) are cooperating on the development of a text mining application for the automated identification and ranking of scientific articles. The so-called “topic scoring engine” is based on the SAS Text Miner...